| Process Name | Brief Description | Code Syntax |
|---|---|---|
| One vs One classifier (using logistic regression) | Process: Trains one classifier for each pair of classes. Key hyperparameters: `estimator`: base classifier (e.g., logistic regression). Pros: Can work well for small datasets. Cons: Computationally expensive for large datasets. Common applications: Multiclass classification problems where the number of classes is relatively small. | `OneVsOneClassifier(LogisticRegression())` |
| One vs All classifier (using logistic regression) | Process: Trains one classifier per class, where each classifier distinguishes between one class and the rest. Key hyperparameters: `estimator`: base classifier (e.g., logistic regression); `multi_class`: strategy to handle multiclass classification (`ovr`). Pros: Simpler and more scalable than One vs One. Cons: Less accurate for highly imbalanced classes. Common applications: Common in multiclass classification problems such as image classification. | `OneVsRestClassifier(LogisticRegression())` or `LogisticRegression(multi_class='ovr')` |
| Decision tree classifier | Process: A tree-based classifier that splits data into smaller subsets based on feature values. Key hyperparameters: `max_depth`: maximum depth of the tree. Pros: Easy to interpret and visualize. Cons: Prone to overfitting if not pruned properly. Common applications: Classification tasks, such as credit risk assessment. | `DecisionTreeClassifier(max_depth=5)` |
| Decision tree regressor | Process: Similar to the decision tree classifier, but used for regression tasks to predict continuous values. Key hyperparameters: `max_depth`: maximum depth of the tree. Pros: Easy to interpret, handles nonlinear data. Cons: Can overfit and perform poorly on noisy data. Common applications: Regression tasks, such as predicting housing prices. | `DecisionTreeRegressor(max_depth=5)` |
| Linear SVM classifier | Process: A linear classifier that finds the optimal hyperplane separating classes with a maximum margin. Key hyperparameters: `C`: regularization parameter; `kernel`: type of kernel function (`linear`, `poly`, `rbf`, etc.); `gamma`: kernel coefficient (only for `rbf`, `poly`, etc.). Pros: Effective in high-dimensional spaces. Cons: Not ideal for nonlinear problems without kernel tricks. Common applications: Text classification and image recognition. | `SVC(kernel='linear', C=1.0)` |
| K-nearest neighbors classifier | Process: Classifies data based on the majority class among its nearest neighbors. Key hyperparameters: `n_neighbors`: number of neighbors to use; `weights`: weight function used in prediction (`uniform` or `distance`); `algorithm`: algorithm used to compute the nearest neighbors (`auto`, `ball_tree`, `kd_tree`, `brute`). Pros: Simple and effective for small datasets. Cons: Computationally expensive as the dataset grows. Common applications: Recommendation systems, image recognition. | `KNeighborsClassifier(n_neighbors=5)` |
| Random Forest regressor | Process: An ensemble method using multiple decision trees to improve accuracy and reduce overfitting. Key hyperparameters: `n_estimators`: number of trees in the forest; `max_depth`: maximum depth of each tree. Pros: Less prone to overfitting than individual decision trees. Cons: Model complexity increases with the number of trees. Common applications: Regression tasks such as predicting sales or stock prices. | `RandomForestRegressor(n_estimators=100, max_depth=5)` |
| XGBoost regressor | Process: A gradient boosting method that builds trees sequentially, each correcting the errors of the previous ones. Key hyperparameters: `n_estimators`: number of boosting rounds; `learning_rate`: step-size shrinkage applied at each boosting round; `max_depth`: maximum depth of each tree. Pros: High accuracy and works well with large datasets. Cons: Computationally intensive, complex to tune. Common applications: Predictive modeling, especially in Kaggle competitions. | `XGBRegressor(n_estimators=100, learning_rate=0.1, max_depth=5)` |
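The multiclass strategies in the table above can be compared directly. The following is a minimal sketch, assuming scikit-learn is installed; the iris dataset, split ratio, and `max_iter`/`max_depth` values are illustrative choices, not part of the table:

```python
# Illustrative comparison of three classifiers from the table on the iris
# dataset. Dataset and hyperparameter choices here are assumptions for the
# sake of the example, not prescriptions.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

models = {
    "One vs One": OneVsOneClassifier(LogisticRegression(max_iter=1000)),
    "One vs Rest": OneVsRestClassifier(LogisticRegression(max_iter=1000)),
    "Decision tree": DecisionTreeClassifier(max_depth=3, random_state=42),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)            # each strategy exposes the same fit/predict API
    scores[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: {scores[name]:.3f}")
```

Because all three wrappers follow the same `fit`/`predict` interface, swapping strategies is a one-line change.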
| Method Name | Brief Description | Code Syntax |
|---|---|---|
| OneHotEncoder | Transforms categorical features into a one-hot encoded matrix. | `OneHotEncoder().fit_transform(X)` |
| accuracy_score | Computes the accuracy of a classifier by comparing predicted and true labels. | `accuracy_score(y_true, y_pred)` |
| LabelEncoder | Encodes labels (target variable) into numeric format. | `LabelEncoder().fit_transform(y)` |
| plot_tree | Plots a fitted decision tree model for visualization. | `plot_tree(model)` |
| normalize | Scales each sample (row) to unit norm. Note: this is not zero-mean/unit-variance standardization; use `StandardScaler` for that. | `normalize(X)` |
| compute_sample_weight | Computes sample weights for imbalanced datasets. | `compute_sample_weight(class_weight='balanced', y=y)` |
| roc_auc_score | Computes the Area Under the Receiver Operating Characteristic Curve (AUC-ROC) from true labels and predicted scores, most commonly for binary classification models. | `roc_auc_score(y_true, y_score)` |